Working Women: A study on female participation in the labor force around the world
Historically, women around the world have faced barriers when entering and staying in the workforce. Using data from around the world, I will compare labor participation rates in different countries and look at potential corresponding factors.
Some of the research questions I will explore include:
How do female participation rates vary from country to country?
What variables in the data set correlate with female participation rates?
Do other variables, such as life expectancy and region, relate to each other?
The data used in this analysis is from The World Bank. I used the gender section to find variables related to gender differences in working levels. This dashboard will mainly focus on data from 2020, the most recent complete year of reporting.
There were over 60 numerical variables in the original data set; however, I selected a few with the most complete data to focus on some key indicators.
Country: the country of the observation
-Not all countries are represented, and some have more data than others
Year: the year of the observation
-The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.
-The variables female & male participation rate and female percentage of the labor force have data starting in 1990.
Region: the region of the country
-There are 7 regions
Income Level: the income level of the country
-There are 4 income levels
-According to the World Bank, “the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year.” More about the income classification can be found here.
Male Life Expectancy: life expectancy at birth, male (years)
Female Life Expectancy: life expectancy at birth, female (years)
Fertility Rate: average number of children born per woman (births per woman)
Female Labor: female labor force as a share of the total labor force (percentage)
-Shows what share of the labor force is made up of women
-The labor force is made up of people ages 15 and older who supply labor
Female Participation: percentage of women ages 15 and older who supply labor
Male Participation: percentage of men ages 15 and older who supply labor
In both the summary statistics and correlation tabs, only data from 2020 will be used.
Summary Statistics
The summary statistics tab shows information about each of the variables in the data set.
The number of countries in each region and income group is shown at the top.
The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.
Female life expectancy is higher on average than male life expectancy
The male participation rate tends to be higher than the female participation rate
Both the female percentage of the labor force and the female participation rate vary widely across countries
Correlation Plot
The correlation plot shows relationships between the numerical variables in the data set.
Male and female life expectancy are the most highly correlated variables in the data set, likely because men and women in the same country share similar living conditions.
Female life expectancy and fertility rate have the strongest negative correlation in the data set: women tend to live longer in countries where the average fertility rate is lower.
Female participation and female labor are also strongly positively correlated: as the percentage of women working rises, the share of the workforce that is female tends to rise.
Apart from female labor, female participation is not strongly correlated with any of the other numerical variables in the data set.
In the next few tabs, I will explore how female participation relates to region and income.
Categorical Variables
| Region | Countries | Income Group | Countries |
|---|---|---|---|
| East Asia & Pacific | 37 | Low income | 28 |
| Europe & Central Asia | 58 | Lower middle income | 54 |
| Latin America & Caribbean | 42 | Upper middle income | 54 |
| Middle East & North Africa | 21 | High income | 80 |
| North America | 3 | NA's | 1 |
| South Asia | 8 | | |
| Sub-Saharan Africa | 48 | | |
Numerical Variables
| Variable | Min | Mean | Max | Missing Values (%) |
|---|---|---|---|---|
| Male Life Expectancy | 51.45 | 70.57 | 82.9 | 8.29 |
| Female Life Expectancy | 55.88 | 75.47 | 88 | 8.29 |
| Fertility Rate | 0.84 | 2.57 | 6.74 | 7.83 |
| Female Percentage of Labor Force | 8.27 | 41.17 | 54.91 | 13.82 |
| Male Participation Rate | 44.24 | 69.2 | 95.44 | 13.82 |
| Female Participation Rate | 6.08 | 49.69 | 83.05 | 13.82 |
The table below shows the countries and corresponding variables from 2020.
Next, I looked at the distribution of female participation. It is close to a normal distribution but slightly skewed to the left.
The mean value for female participation is 49.69%.
The country with the lowest participation rate is Yemen, Rep., at 6.08%. Yemen is in the Middle East & North Africa region and is classified as low income.
The country with the highest participation rate is the Solomon Islands, at 83.05%. The Solomon Islands are classified as lower middle income and are located in the East Asia & Pacific region.
Region
Region is shown in two plots: a map of 2020 values by country (Exploration tab) and a line chart of average participation by region over time.
Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.
The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most over the past 30 years, with both regions increasing by roughly 5 to 10 percentage points.
As of 2020, the gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points.
Income
The distribution by income shows some interesting results.
Differences in median participation across income levels are smaller than the differences across regions.
Low income countries have the highest median rate of female participation, followed by high income, upper middle income, and lower middle income.
The lower middle income category has the largest spread.
It is surprising that the two ends of the income spectrum have the highest median rates of female participation.
Female Percentage
Female percentage of the labor force is highly correlated with female participation.
Aside from a few outliers, most points fall fairly close to the fitted trend line.
The high correlation makes sense mathematically: as the proportion of women who work increases, so should the female percentage of the labor force.
Region and female percentage are two of the top predictors for female participation. Female participation rates vary widely by region groups and are strongly correlated with the female percentage of the labor force.
After looking at several relationships, I found that region was associated with the largest differences across the other variables. This section focuses on some of those regional differences.
Fertility Rate
The fertility rate in Africa is much higher than in the rest of the world.
Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births/woman higher than any other region.
South Korea has the lowest fertility rate, at 0.84 births/woman, and Niger has the highest, at 6.74.
Income
Income varies widely by region.
North America has the highest proportion of high income countries (all of its countries are high income), followed by Europe & Central Asia at about 66%.
Half of the countries in Sub-Saharan Africa are classified as low income, and South Asia has the next highest proportion of low income countries (12%).
The regions East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa each have at least one country in every income level.
Female Life Expectancy
Female life expectancy has increased over time in all regions.
As of 2020, North America has the highest female life expectancy, at 83.4 years, and Sub-Saharan Africa has the lowest, at 65.3 years.
South Asia's female life expectancy has increased the most, rising from 41.8 years to 73.5 years (+31.7 years).
North America's average shows small dips because Bermuda has data for only some years between 1960 and 2000, and in those years it pulls the regional average down.
The rate of women who participate in the workforce varies drastically around the world, and many variables are associated with female participation rates in different countries. After studying many of them, I learned that some carry more weight than others. Region, income, and female percentage of the labor force were the three variables most strongly associated with female participation rates. In addition, region was a strong predictor of many of the variables in the data set, including fertility rate and life expectancy.
The available data placed limitations on this study. There were many more numerical variables I could have used in my analysis; however, many of them had missing values for many countries. I focused my study on the relevant variables with the most complete data.
I made several assumptions throughout the analysis.
The missing values for each variable would not have a dramatic impact on my results. In 2020, there was no female participation data for 14% of countries. While many of these countries were small, I still had to exclude them from my results.
When grouping by region over time, not all countries had data going back as far as others. In addition, I did not weight the averages by population, something that could be improved upon in the future.
I learned many things while putting this project together, both about R Markdown and about labor participation rates. My biggest takeaway is how strongly the variables were grouped by region: countries in the same region were much more likely to share similar characteristics than countries grouped by any other indicator.
While I narrowed this study down to 8 variables, much work could still be done on the remaining ones. It would be interesting to see the effect education has on female participation rates, as well as the difference between male and female participation rates. In some regions the gender gap is shrinking, and an analysis of the reasons behind the closing gap would add context to my presentation.
Another way to study this data would be country by country or region by region. I chose to give a broad overview of the world focusing on the year 2020, but looking at a specific region could add insight.
All data used in this project came from The World Bank. It was a great resource for accessing a large amount of real-world data. In addition to the data, the World Bank has other resources that I used to better understand the data after performing some initial analysis.
A resource I found very helpful was iMediaProf: his YouTube video taught me how to embed a Tableau view into my markdown file.
A big thank you to Dr. Chen for all her teaching and guidance!
---
title: "Working Women"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    theme:
      bootswatch: zephyr
---
```{r setup, include=FALSE}
library(flexdashboard)
```
```{r imports}
setwd("C:/Users/clari/Documents/School/Classes/MTH 209/final project")
library(pacman)
p_load(tidyverse, ggplot2, RColorBrewer, DataExplorer, vtable, scales)
#reading in files
gender <- read_csv("data/gender.csv", skip = 4)
colnames(gender) <- mapply(gsub, 'X', '', colnames(gender), USE.NAMES = FALSE)
gender <- gender %>% rename(country_code = "Country Code", country_name = "Country Name", ind_code = "Indicator Code", ind_name = "Indicator Name")
region_income <- read_csv("data/region_income_level.csv")
region_income <- region_income %>%
  rename(country_code = "Country Code", region = "Region", income_group = "IncomeGroup") %>%
  select(country_code, region, income_group)
region_income <- region_income %>% subset(!is.na(country_code)) %>% subset(nchar(country_code) == 3)
#Creating data frame with wanted variables
indicator_names = c("m_life_exp","f_life_exp", "fertility_rate", "female_labor", "male_participation", "female_participation")
df <- gender %>% mutate(indicator = case_when(
  ind_code == "SP.DYN.LE00.MA.IN" ~ indicator_names[1],
  ind_code == "SP.DYN.LE00.FE.IN" ~ indicator_names[2],
  ind_code == "SP.DYN.TFRT.IN" ~ indicator_names[3],
  ind_code == "SL.TLF.TOTL.FE.ZS" ~ indicator_names[4],
  ind_code == "SL.TLF.CACT.MA.ZS" ~ indicator_names[5],
  ind_code == "SL.TLF.CACT.FE.ZS" ~ indicator_names[6]
))
df <- subset(df, !is.na(indicator))
df <- df %>% select(-c(ind_name, ind_code)) %>% select("indicator", "country_name", "country_code", everything())
# Reshape to one row per country-year. This relies on every indicator
# block listing the same countries in the same order, and on columns
# 4:65 holding the years 1960-2021.
df <- data.frame(country = rep(unique(df$country_name), 62),
                 country_code = rep(unique(df$country_code), 62),
                 year = rep(1960:2021, each = length(unique(df$country_name))),
                 m_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[1], 4:65]))),
                 f_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[2], 4:65]))),
                 fertility_rate = unname(unlist(as.vector(df[df$indicator==indicator_names[3], 4:65]))),
                 female_labor = unname(unlist(as.vector(df[df$indicator==indicator_names[4], 4:65]))),
                 male_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[5], 4:65]))),
                 female_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[6], 4:65])))
)
df <- df %>% left_join(region_income, by = "country_code") %>%
select(country, year, country_code, region, income_group, everything())
df <- df %>% mutate_if(is.character, as.factor)
df$income_group <- factor(df$income_group, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))
#Taking out data that was not grouped by individual country
data_2020 <- df %>% subset(year == "2020") %>% select(-c("year")) %>% subset(!is.na(region))
#Averages by region and year
region_groups <- df %>% group_by(year, region) %>% summarise(avg_female_labor = mean(female_labor, na.rm = T), avg_female_le = mean(f_life_exp, na.rm = T), avg_female_participation = mean(female_participation, na.rm = T))
```
Data Introduction
=======================================================================
Column {.tabset data-width=600 .tabset-fade}
-----------------------------------------------------------------------
### Motivation and Background
<font size="5"> **Working Women: A study on female participation in the labor force around the world**</font>
Historically, women around the world have faced barriers when entering and staying in the workforce. Using data from around the world, I will compare labor participation rates in different countries and look at potential corresponding factors.
Some of the research questions I will explore include:
- How do female participation rates vary from country to country?
- What variables in the data set correlate with female participation rates?
- Do other variables, such as life expectancy and region, relate to each other?
The data used in this analysis is from [The World Bank](https://genderdata.worldbank.org/). I used the gender section to find variables related to gender differences in working levels. This dashboard will mainly focus on data from 2020, the most recent complete year of reporting.
### Variable Explanations
There were over 60 numerical variables in the original data set; however, I selected a few with the most complete data to focus on some key indicators.
- **Country**: the country of the observation
    - Not all countries are represented, and some have more data than others
- **Year**: the year of the observation
    - The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.
    - The variables female & male participation rate and female percentage of the labor force have data starting in 1990.
- **Region**: the region of the country
    - There are 7 regions
- **Income Level**: the income level of the country
    - There are 4 income levels
    - According to the World Bank, "the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year." More about the income classification can be found [here](https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023#).
- **Male Life Expectancy**: life expectancy at birth, male (years)
- **Female Life Expectancy**: life expectancy at birth, female (years)
- **Fertility Rate**: average number of children born per woman (births per woman)
- **Female Labor**: female labor force as a share of the total labor force (percentage)
    - Shows what share of the labor force is made up of women
    - The labor force is made up of people ages 15 and older who supply labor
- **Female Participation**: percentage of women ages 15 and older who supply labor
- **Male Participation**: percentage of men ages 15 and older who supply labor
### Analysis
In both the summary statistics and correlation tabs, only data from 2020 will be used.
**Summary Statistics**
The summary statistics tab shows information about each of the variables in the data set.
The number of countries in each region and income group is shown at the top.
The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.
- Female life expectancy is higher on average than male life expectancy
- The male participation rate tends to be higher than the female participation rate
- Both the female percentage of the labor force and the female participation rate vary widely across countries
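For readers without `vtable`, the same four statistics that `st()` reports (min, mean, max, percentage missing) can be reproduced in base R. This is a minimal sketch on a small made-up data frame; `summarise_numeric()` and `toy` are hypothetical names, not part of the project. In the dashboard, the input would be `data_2020` with the categorical columns dropped.

```r
# Hypothetical helper: one row per numeric column, with the same statistics
# as the st() call above. NAs are ignored for min/mean/max and counted
# separately as a percentage.
summarise_numeric <- function(d) {
  num <- d[vapply(d, is.numeric, logical(1))]   # keep numeric columns only
  data.frame(
    Variable    = names(num),
    Min         = vapply(num, min,  numeric(1), na.rm = TRUE),
    Mean        = vapply(num, mean, numeric(1), na.rm = TRUE),
    Max         = vapply(num, max,  numeric(1), na.rm = TRUE),
    Missing_pct = vapply(num, function(x) mean(is.na(x)) * 100, numeric(1)),
    row.names   = NULL
  )
}

# Made-up values for illustration only
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  fertility_rate = c(1.5, 2.0, NA, 4.5),
  female_participation = c(40, 55, 60, NA)
)
summary_tbl <- summarise_numeric(toy)
summary_tbl
```

Missing values are reported as a share of all rows, which matches how `propNA(x)*100` is used in the `st()` call.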
----------------------------------------------------------------
**Correlation Plot**
The correlation plot shows relationships between the numerical variables in the data set.
- Male and female life expectancy are the most highly correlated variables in the data set, likely because men and women in the same country share similar living conditions.
- Female life expectancy and fertility rate have the strongest negative correlation in the data set: women tend to live longer in countries where the average fertility rate is lower.
- Female participation and female labor are also strongly positively correlated: as the percentage of women working rises, the share of the workforce that is female tends to rise.
- Apart from female labor, female participation is not strongly correlated with any of the other numerical variables in the data set.
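`plot_correlation()` passes `cor_args` on to `stats::cor()`, so `use = "complete.obs"` means listwise deletion: a country missing any one variable is dropped from every correlation. The sketch below uses simulated data (not the World Bank values) to compare this with pairwise deletion and to pick the strongest positive and negative pairs out of the matrix. `strongest_pairs()` is a hypothetical helper.

```r
set.seed(1)
n <- 30
# Simulated stand-ins for three of the numerical variables
toy <- data.frame(f_life_exp = rnorm(n, 75, 5))
toy$m_life_exp     <- toy$f_life_exp - 5 + rnorm(n, 0, 1)
toy$fertility_rate <- 10 - 0.1 * toy$f_life_exp + rnorm(n, 0, 0.3)
toy$fertility_rate[1:5] <- NA   # five countries missing one variable

listwise <- cor(toy, use = "complete.obs")          # drops those 5 rows everywhere
pairwise <- cor(toy, use = "pairwise.complete.obs") # drops them only where needed
round(listwise - pairwise, 3)

# Hypothetical helper: name the most positive and most negative pairs
strongest_pairs <- function(r) {
  r[lower.tri(r, diag = TRUE)] <- NA   # keep each pair once, skip the diagonal
  idx_max <- which(r == max(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
  idx_min <- which(r == min(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(
    most_positive = c(rownames(r)[idx_max[1]], colnames(r)[idx_max[2]]),
    most_negative = c(rownames(r)[idx_min[1]], colnames(r)[idx_min[2]])
  )
}
res <- strongest_pairs(listwise)
res
```

Listwise deletion keeps every coefficient on the same set of countries, which makes the cells comparable. Pairwise deletion uses more data per cell, but each cell may rest on a different subset of countries.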
In the next few tabs, I will explore how female participation relates to region and income.
Column {.tabset data-width=400 .tabset-fade}
-----------------------------------------------------------------------
### Summary Statistics
<br>
<span style="color: lightgrey;">Categorical Variables</span>
``` {r summary_cat}
region_income_table <- summary(data_2020 %>% select(region, income_group))
colnames(region_income_table) <- c("Region", "Income Group")
region_income_table
```
<span style="color: lightgrey;">Numerical Variables</span>
``` {r summary_num}
labs <- c('Male Life Expectancy',
'Female Life Expectancy',
'Fertility Rate',
'Female Percentage of Labor Force',
'Male Participation Rate',
'Female Participation Rate')
st(data_2020 %>% select(-c("region", "income_group", "country", "country_code")),
summ=c('min(x)',
'mean(x)',
'max(x)',
'propNA(x)*100'),
summ.names = c('Min',
'Mean',
'Max',
'Missing Values (%)'),
title = "",
digits = 2,
labels = labs)
```
### Correlation
``` {r correlation}
corr <- data_2020 %>% select(-c("region", "income_group", "country", "country_code"))
plot_correlation(corr, cor_args = list("use" = "complete.obs"))
```
Exploration
=======================================================================
Column {.tabset .tabset-fade}
----------------------------------------------------------------------
### Data Table
<br>
The table below shows the countries and corresponding variables from 2020.
<br>
``` {r view}
DT::datatable(df %>% filter(year == "2020", !is.na(region))) %>%
DT::formatRound(columns=c("female_labor", "male_participation", "female_participation"), digits=3)
```
### Worldwide Map
<div class='tableauPlaceholder' id='viz1669924235929' style='position: relative'><noscript><a href='#'><img alt='Female Participation ' src='https://public.tableau.com/static/images/Fe/FemaleParticipationWorldwide/FemaleParticipation/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='FemaleParticipationWorldwide/FemaleParticipation' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/Fe/FemaleParticipationWorldwide/FemaleParticipation/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>
``` {js, embedcode}
var divElement = document.getElementById('viz1669924235929');
var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.4)+'px';
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
```
Female Participation
=======================================================================
Column {.tabset data-width=550 .tabset-fade}
----------------------------------------------------------------------
### Histogram
Next, I looked at the distribution of female participation. It is close to a normal distribution but slightly skewed to the left.
``` {r f_hist}
ggplot(data_2020, aes(x= female_participation)) + geom_histogram(na.rm=T, binwidth = 5, col = "white", fill = "#1b2085") + labs(x = "Female Participation (%)", y = "Number of Countries", title = "Distribution of Female Participation in the Labor Force")
```
<br>
The mean value for female participation is `r round(mean(data_2020$female_participation, na.rm = T),2)`%.
The country with the lowest participation rate is `r data_2020[which.min(data_2020$female_participation), "country"]`, at `r round(data_2020[which.min(data_2020$female_participation), "female_participation"],2)`%. It is in the `r data_2020[which.min(data_2020$female_participation), "region"]` region and is classified as `r tolower(data_2020[which.min(data_2020$female_participation), "income_group"])`.
The country with the highest participation rate is `r data_2020[which.max(data_2020$female_participation), "country"]`, at `r round(data_2020[which.max(data_2020$female_participation), "female_participation"],2)`%. It is classified as `r tolower(data_2020[which.max(data_2020$female_participation), "income_group"])` and located in the `r data_2020[which.max(data_2020$female_participation), "region"]` region.
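The "slightly skewed to the left" reading of the histogram can also be checked numerically. This is a minimal sketch that assumes the usual moment-based sample skewness, where negative values indicate a longer left tail. `sample_skewness()` is a hypothetical helper, and the vectors are made up. In the dashboard, it would be applied to `data_2020$female_participation`.

```r
# Moment-based sample skewness: mean cubed deviation divided by the
# cube of the (population) standard deviation. NAs are dropped first.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

symmetric   <- c(1, 2, 3, 4, 5)
left_tailed <- c(1, 8, 9, 9, 10)   # one low value stretches the left tail

sample_skewness(symmetric)    # 0 for a symmetric sample
sample_skewness(left_tailed)  # negative, i.e. left-skewed
```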
### Plot Analysis
**Region**
Region is shown in two plots: a map of 2020 values by country (Exploration tab) and a line chart of average participation by region over time.
- Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.
- The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most over the past 30 years, with both regions increasing by roughly 5 to 10 percentage points.
- As of 2020, the gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points.
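The changes and gaps quoted above are differences between region-year averages like those stored in `region_groups`. This sketch shows that arithmetic on a tiny made-up table. The numbers are illustrative only, not the World Bank values, and `change_between()` and `toy_groups` are hypothetical names.

```r
# Made-up region-year averages in the same long shape as region_groups
toy_groups <- data.frame(
  year   = c(1990, 2020, 1990, 2020),
  region = c("Region A", "Region A", "Region B", "Region B"),
  avg_female_participation = c(20, 28, 60, 62)
)

# Change in percentage points between two years, largest first
change_between <- function(d, from, to) {
  start <- d[d$year == from, c("region", "avg_female_participation")]
  end   <- d[d$year == to,   c("region", "avg_female_participation")]
  m <- merge(start, end, by = "region", suffixes = c("_start", "_end"))
  m$change <- m$avg_female_participation_end - m$avg_female_participation_start
  m[order(-m$change), c("region", "change")]
}
change_between(toy_groups, 1990, 2020)

# Gap between the highest and lowest regional average in 2020
gap_2020 <- diff(range(toy_groups$avg_female_participation[toy_groups$year == 2020]))
gap_2020
```

Because participation is already a percentage, these differences are in percentage points, not percent changes.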
---------------------------------
**Income**
The distribution by income shows some interesting results.
- Differences in median participation across income levels are smaller than the differences across regions.
- Low income countries have the highest median rate of female participation, followed by high income, upper middle income, and lower middle income.
- The lower middle income category has the largest spread.
- It is surprising that the two ends of the income spectrum have the highest median rates of female participation.
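The medians and spreads read off the boxplots can also be computed directly. This is a minimal sketch using `tapply()` on made-up values, where `toy` is hypothetical. With the real data, the grouping variable would be `data_2020$income_group`. One common measure of spread is the interquartile range from `IQR()`, which corresponds to the height of each box.

```r
# Made-up participation rates for two income groups
toy <- data.frame(
  income_group = factor(
    c("Low income", "Low income", "Low income",
      "High income", "High income", "High income"),
    levels = c("Low income", "High income")
  ),
  female_participation = c(60, 70, 65, 50, 58, 54)
)

# Median (the line inside each box) and IQR (the height of each box)
medians <- tapply(toy$female_participation, toy$income_group, median, na.rm = TRUE)
spreads <- tapply(toy$female_participation, toy$income_group, IQR, na.rm = TRUE)
medians
spreads
```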
---------------------------------------
**Female Percentage**
Female percentage of the labor force is highly correlated with female participation.
- Aside from a few outliers, most points fall fairly close to the fitted trend line.
- The high correlation makes sense mathematically: as the proportion of women who work increases, so should the female percentage of the labor force.
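The "makes sense mathematically" argument can be made explicit. Let W and M be the numbers of women and men aged 15 and older, and let p_f and p_m be their participation rates. The female share of the labor force is then s = p_f W / (p_f W + p_m M), which rises whenever p_f rises and the other quantities stay fixed. The R sketch below implements this formula. It assumes equal numbers of women and men by default, which is a simplification and not a property of the data, and `female_share()` is a hypothetical helper. As a rough sanity check only, plugging in the 2020 means from the summary table (49.69% and 69.2%) gives about 41.8%, close to the observed mean share of 41.17%. A ratio of means is not the same as a mean of ratios, however, so this comparison is approximate.

```r
# Female share of the labor force implied by participation rates.
# women/men: population aged 15+ (assumed equal unless supplied)
female_share <- function(p_f, p_m, women = 1, men = 1) {
  female_lf <- p_f * women   # women in the labor force
  male_lf   <- p_m * men     # men in the labor force
  female_lf / (female_lf + male_lf)
}

# Holding male participation at 69.2%, raise female participation
p_f <- c(20, 40, 60, 80)
round(100 * female_share(p_f, 69.2), 1)

# Rough check with the 2020 summary-table means
round(100 * female_share(49.69, 69.2), 1)
```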
----------------------------------------
Region and female percentage are two of the top predictors for female participation. Female participation rates vary widely by region groups and are strongly correlated with the female percentage of the labor force.
Column {.tabset data-width=450 .tabset-fade}
--------------------------------------------------------------------
### Region
``` {r region_1}
ggplot(region_groups, aes(x = year, y = avg_female_participation, group = region, col = region)) +
  geom_line(na.rm = T, linewidth = 1.2) +
  xlim(1988, 2022) +
  scale_color_brewer(palette = "Set2", na.translate = FALSE) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  labs(title = "Average Female Participation by Region from 1990-2020", x = "Year", y = "Average Female Participation (%)", col = "Region") +
  theme(text = element_text(size = 10))
```
### Income
``` {r income}
ggplot(data_2020, aes(x=income_group, y = female_participation)) + geom_boxplot(na.rm = TRUE, fill = "#4F8073") +
scale_x_discrete(na.translate = FALSE) + labs(x = "Income Level", y = "Female Participation (%)", title = "Female Participation Distribution by Income")
```
### Female Percentage
``` {r f_perc}
ggplot(data_2020, aes(x = female_labor, y=female_participation )) + geom_point(na.rm = TRUE, col = "#0076C7") +
geom_smooth(na.rm = TRUE, se = FALSE, col = "black") + labs(x = "Labor Force - Female (%)", y = "Female Participation (%)", title = "Female Percent vs. Female Participation of the Labor Force")
```
Regional Correlations
=======================================================================
Column {.tabset data-width=500 .tabset-fade}
-----------------------------------------------------------------------
### Fertility Rate
``` {r r_map}
library(maps)
map <- map_data("world")
#Recoding names to match data set
map$region <- map$region %>% recode(
  "USA" = "United States",
  "Venezuela" = "Venezuela, RB",
  "Egypt" = "Egypt, Arab Rep.",
  "Iran" = "Iran, Islamic Rep.",
  "North Korea" = "Korea, Dem. People's Rep.",
  "South Korea" = "Korea, Rep.",
  "Turkey" = "Turkiye",
  "Yemen" = "Yemen, Rep.",
  "Laos" = "Lao PDR",
  "Russia" = "Russian Federation",
  "Syria" = "Syrian Arab Republic",
  "Democratic Republic of the Congo" = "Congo, Dem. Rep.",
  "Republic of Congo" = "Congo, Rep.",
  # Note: French Guiana is an overseas region of France, not part of Guyana;
  # this recode shades it with Guyana's values.
  "French Guiana" = "Guyana",
  "Kyrgyzstan" = "Kyrgyz Republic",
  "Ivory Coast" = "Cote d'Ivoire",
  "Virgin Islands" = "Virgin Islands (U.S.)",
  "Saint Vincent" = "St. Vincent and the Grenadines",
  "Trinidad" = "Trinidad and Tobago",
  "Sint Maarten" = "Sint Maarten (Dutch part)",
  "Slovakia" = "Slovak Republic",
  "Gambia" = "Gambia, The",
  "UK" = "United Kingdom",
  "Saint Martin" = "St. Martin (French part)",
  "Saint Lucia" = "St. Lucia",
  "Antigua" = "Antigua and Barbuda",
  "Bahamas" = "Bahamas, The"
)
gender_map <- data_2020 %>% left_join(map, by = c("country"="region"))
```
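Because `left_join()` keeps every row of `data_2020`, a country whose name has no match in `map_data("world")` ends up with no polygon, and no error is raised. One quick way to see which names still need recoding is to compare the two sets of names before joining. This is a minimal sketch with made-up name vectors. `data_names`, `map_names`, and `recode_table` are hypothetical stand-ins for `unique(data_2020$country)`, `unique(map$region)`, and the `recode()` call above, and "Example Island" is a placeholder name.

```r
# Hypothetical stand-ins for unique(data_2020$country) and unique(map$region)
data_names <- c("United States", "Korea, Rep.", "Yemen, Rep.", "Example Island")
map_names  <- c("USA", "South Korea", "Yemen", "France")

# A small lookup table in the same spirit as the recode() call above
recode_table <- c("USA" = "United States",
                  "South Korea" = "Korea, Rep.",
                  "Yemen" = "Yemen, Rep.")
recoded <- ifelse(map_names %in% names(recode_table),
                  recode_table[map_names], map_names)

# Data countries that would get no polygon after the join
setdiff(data_names, recoded)
```

`dplyr::anti_join()` on the two tables gives the same check in tidyverse style.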
``` {r fertility_map, fig.height = 7, fig.width = 12}
ggplot(gender_map, aes(long, lat, group = group)) + geom_polygon(aes(fill = fertility_rate), color = "white") + scale_fill_viridis_c(option = "D") + theme_void() + labs(fill = "Fertility Rate") + theme(legend.text = element_text(size=14), legend.title = element_text(size=16), legend.position = "bottom")
```
### Income
``` {r income_region}
ggplot(data_2020, aes(x = region, fill = income_group)) + geom_bar(position = "fill", na.rm = TRUE) + scale_x_discrete(na.translate = FALSE, labels = label_wrap(12)) + scale_y_continuous(breaks = seq(0,1,by = .2), labels = percent) + scale_fill_manual(values = c("#472d30", "#723d46", "#ad2a56", "#ba8466", "#03071e")) + theme(legend.position="top") + labs(title = "Income Levels by Region", x = "Region", y = "Percentage", fill = "") + theme(text = element_text(size=10))
```
### Female Life Expectancy
``` {r f_le_region}
ggplot(region_groups, aes(x = year, y = avg_female_le, group = region, col = region)) +
  geom_line(na.rm = T, linewidth = 1.2) +
  scale_color_brewer(palette = "Set2", na.translate = FALSE) +
  theme(legend.position = "bottom") +
  guides(colour = guide_legend(title.position = "top")) +
  labs(title = "Female Life Expectancy by Region from 1960-2020", x = "Year", y = "Average Female Life Expectancy (years)", col = "Region") +
  theme(text = element_text(size = 10))
```
Column {data-width=500}
-----------------------------------------------------------------------
### Analysis
After looking at several relationships, I found that region was associated with the largest differences across the other variables. This section focuses on some of those regional differences.
-----------
**Fertility Rate**
The fertility rate in Africa is much higher than in the rest of the world.
- Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births/woman higher than any other region.
- South Korea has the lowest fertility rate, at 0.84 births/woman, and Niger has the highest, at 6.74.
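The regional figures above are group means of the 2020 country values. The "almost 2 births" comparison is the gap between the highest regional mean and the runner-up. This sketch shows that calculation on made-up values (`toy` is hypothetical). With the real data, the inputs would be `data_2020$fertility_rate` and `data_2020$region`.

```r
# Made-up fertility rates for three regions
toy <- data.frame(
  region = c("Region A", "Region A", "Region B", "Region B", "Region C"),
  fertility_rate = c(4.0, 5.0, 2.0, 2.6, 1.2)
)

# Mean by region, highest first
region_means <- sort(tapply(toy$fertility_rate, toy$region, mean, na.rm = TRUE),
                     decreasing = TRUE)
region_means

# Gap between the top region and the next highest one (births per woman)
gap <- unname(region_means[1] - region_means[2])
gap
```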
------------
**Income**
Income varies widely by region.
- North America has the highest proportion of high income countries (all of its countries are high income), followed by Europe & Central Asia at about 66%.
- Half of the countries in Sub-Saharan Africa are classified as low income, and South Asia has the next highest proportion of low income countries (12%).
- The regions East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa each have at least one country in every income level.
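`geom_bar(position = "fill")` scales each region's bar to 100%, so the percentages quoted above are row proportions of a region-by-income contingency table. This is a minimal sketch using `table()` and `prop.table()` on made-up rows, where `toy` is hypothetical.

```r
# Made-up countries: region plus income group with the dashboard's level order
toy <- data.frame(
  region = c(rep("Region A", 5), rep("Region B", 2)),
  income_group = factor(
    c("Low income", "Low income", "Lower middle income",
      "Upper middle income", "High income",
      "High income", "High income"),
    levels = c("Low income", "Lower middle income",
               "Upper middle income", "High income")
  )
)

counts <- table(toy$region, toy$income_group)
shares <- prop.table(counts, margin = 1)   # each row sums to 1
round(100 * shares)

# Regions with at least one country in every income level
every_level <- rownames(counts)[apply(counts > 0, 1, all)]
every_level
```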
-------------
**Female Life Expectancy**
Female life expectancy has increased over time in all regions.
- As of 2020, North America has the highest female life expectancy, at 83.4 years, and Sub-Saharan Africa has the lowest, at 65.3 years.
- South Asia's female life expectancy has increased the most, rising from 41.8 years to 73.5 years (+31.7 years).
- North America's average shows small dips because Bermuda has data for only some years between 1960 and 2000, and in those years it pulls the regional average down.
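The Bermuda dips are a composition effect. An unweighted regional mean changes whenever a country enters or leaves the set of countries reporting that year, even if no country's own value changes. This sketch uses made-up numbers to compare the mean over all reporting countries with the mean over countries that report in every year (a balanced panel). `toy` and its values are hypothetical.

```r
# Made-up panel: A and B report every year, C reports only in 2001
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year = rep(2000:2002, times = 3),
  f_life_exp = c(80, 80, 80,
                 82, 82, 82,
                 NA, 70, NA)
)

# Unweighted mean over whichever countries reported each year
all_reporting <- tapply(toy$f_life_exp, toy$year, mean, na.rm = TRUE)

# Keep only countries with a value in every year
complete <- tapply(!is.na(toy$f_life_exp), toy$country, all)
balanced <- toy[toy$country %in% names(complete)[complete], ]
balanced_mean <- tapply(balanced$f_life_exp, balanced$year, mean)

all_reporting   # dips in 2001 only because C enters
balanced_mean   # flat, since no remaining country changed
```

A balanced panel removes the artificial dips but discards countries with gaps. Whether that trade-off is worth it depends on how many countries would be dropped.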
Conclusions
=======================================================================
Column {data-width=500}
-----------------------------------------------------------------------
### Summary
The rate of women who participate in the workforce varies drastically around the world, and many variables are associated with female participation rates in different countries. After studying many of them, I learned that some carry more weight than others. Region, income, and female percentage of the labor force were the three variables most strongly associated with female participation rates. In addition, region was a strong predictor of many of the variables in the data set, including fertility rate and life expectancy.
The available data placed limitations on this study. There were many more numerical variables I could have used in my analysis; however, many of them had missing values for many countries. I focused my study on the relevant variables with the most complete data.
I made several assumptions throughout the analysis.
- The missing values for each variable would not have a dramatic impact on my results. In 2020, there was no female participation data for 14% of countries. While many of these countries were small, I still had to exclude them from my results.
- When grouping by region over time, not all countries had data going back as far as others. In addition, I did not weight the averages by population, something that could be improved upon in the future.
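Population weighting, mentioned above as a possible improvement, replaces the simple mean with sum(w * x) / sum(w), where w is each country's weight. Participation is measured as a percentage of women aged 15 and older, so weighting by that population gives the share of all such women in the region who participate. This is a minimal sketch using base R's `weighted.mean()`. The participation rates and population counts are made up, and the real population series would have to be downloaded separately from the World Bank.

```r
# Made-up countries: one large, two small
toy <- data.frame(
  country = c("Large", "Small 1", "Small 2"),
  female_participation = c(30, 70, 80),   # percent
  women_15plus = c(50e6, 1e6, 1e6)        # hypothetical population counts
)

unweighted <- mean(toy$female_participation)
weighted   <- weighted.mean(toy$female_participation, w = toy$women_15plus)
c(unweighted = unweighted, weighted = weighted)
```

The unweighted mean treats each country equally, so the two small countries pull it up. The weighted mean stays close to the large country's rate, because most of the region's women live there.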
I learned many things while putting this project together, both about R Markdown and about labor participation rates. My biggest takeaway is how strongly the variables were grouped by region: countries in the same region were much more likely to share similar characteristics than countries grouped by any other indicator.
Column {data-width=500}
-----------------------------------------------------------------------
### Future Work
While I narrowed this study down to 8 variables, much work could still be done on the remaining ones. It would be interesting to see the effect education has on female participation rates, as well as the difference between male and female participation rates. In some regions the gender gap is shrinking, and an analysis of the reasons behind the closing gap would add context to my presentation.
Another way to study this data would be country by country or region by region. I chose to give a broad overview of the world focusing on the year 2020, but looking at a specific region could add insight.
### References
All data used in this project came from [The World Bank](https://genderdata.worldbank.org/). It was a great resource for accessing a large amount of real-world data. In addition to the data, the World Bank has other [resources](https://genderdata.worldbank.org/data-stories/flfp-data-story/) that I used to better understand the data after performing some initial analysis.
A resource I found very helpful was [iMediaProf](https://www.youtube.com/watch?v=yBIfRS56gjo): his YouTube video taught me how to embed a Tableau view into my markdown file.
A big thank you to Dr. Chen for all her teaching and guidance!